We show that it is feasible to formulate the testing migration problem as a 
practically solvable PMAX-SAT instance, when package dependencies and 
conflicts are pre-processed sensibly. 



1 Introduction 

The management of software repositories such as those of Free Software distributions
(Debian, Fedora, ...) or plugin sets (Eclipse, Firefox) poses a number of interesting
problems, due to dependencies and conflicts between the individual software units.

The problem discussed in this paper is that of the testing migration, which arises when
preparing a release: Given a repository containing the software that is ready to be
released, and a set of newly created software, which of the new packages may be added
to the repository such that certain requirements, especially the installability of every
package, are preserved? A formal definition of the problem follows in Section 2.2.

Previously, only questions related to installability have been tackled with formal
methods ([MBC+06], ...), such as which packages from a fixed repository are installable,
and which packages should be installed when upgrading a system. Our problem is related,
but more difficult, because the installability test has to be applied to many possible
choices of updated packages. This also implies that our problem is NP-hard, as testing
package installability already is [Bur05].

The key idea in this paper is to reduce the size of the naive but unreasonably large SAT 
instance by pre-processing of the interaction of dependencies and conflicts between the 
packages, while still leaving the hard part of the SAT solving to a general purpose SAT 
solver. 



* e-mail: breitner@kit.edu
This approach has been implemented by the author and is deployed by the Debian
project to assist their existing, incomplete testing migration implementation. The
main contributions of this paper are these:

• A formal description of the testing migration problem, adaptable to various
concrete applications.

• An implementation of the testing migration problem as a PMAX-SAT instance. 

• Preprocessing steps that make this implementation practicable, with correctness
proofs.

• An implementation of the solution, empirically verifying its practicability and 
usefulness. 

2 Background 

The setting of the testing migration problem is very similar to that of the dependency
solving problem, so we extend the formalization in [MBC+06] to two repositories.

2.1 Repositories 

The units of our problem are packages. For these, we have an abstract set $\mathcal{N}$ of
names and a totally ordered set $\mathcal{V}$ of versions. A package is a tuple of a name and a
version, and $\mathcal{B} \subseteq \mathcal{N} \times \mathcal{V}$ is the set of all packages. For a more realistic
specification of the migration problem, the packages also need to carry an architecture
such as i386, amd64 or armel; but this does not affect the approach described in this
paper, so we ignore this aspect here.

The packages are related by a dependency function $D\colon \mathcal{B} \to \mathcal{P}(\mathcal{P}(\mathcal{B}))$ and the
conflicts relation $C \subseteq \mathcal{B} \times \mathcal{B}$. The intended meaning of $D$ is that if
$\{p_1, \ldots, p_n\} \in D(p)$, then at least one of the $p_i$ has to be installed on a system if
$p$ is to be installed. It is possible to have $\emptyset \in D(p)$; in that case the package
cannot be installed. We assume here that the dependencies are already expanded: In
practice, dependencies are given by package names and version ranges. Replacing such a
construct by the disjunction of all existing packages satisfying the criteria gives our
dependency function; this is called dependency expansion in [MBC+06].

The conflicts relation is symmetric. In contrast to [MBC+06], we do not require $C$ to
contain all pairs of packages with the same name but different version numbers, but keep
this relation separate:

$$C_u := \{(p_1, p_2) \in \mathcal{B} \times \mathcal{B} \mid \pi_1(p_1) = \pi_1(p_2) \wedge \pi_2(p_1) \neq \pi_2(p_2)\}$$
A repository $R \subseteq \mathcal{B}$ is a set of packages. An installation $I \subseteq R$ of a repository
$R$ is a selection of packages. We call the installation healthy if all dependency and
conflict relations are fulfilled:

$$\forall p \in I\colon \forall d \in D(p)\colon d \cap I \neq \emptyset \quad\text{and}\quad (I \times I) \cap C = \emptyset.$$

A package $p \in R$ is installable in $R$ if there exists a healthy installation $I \subseteq R$
with $p \in I$. A repository $R$ is called trimmed if all its packages are installable in it.
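
The definitions of healthy, installable and trimmed can be executed directly on toy repositories. The following is a minimal sketch, assuming packages are plain strings, the dependency function is a dict mapping a package to a list of disjunctions, and conflicts are a set of pairs; the function names are hypothetical, and the exhaustive search is only meant for illustration, not for real repositories.

```python
from itertools import combinations

def is_healthy(installation, deps, conflicts):
    """Check the healthiness condition for a set of packages."""
    inst = set(installation)
    # Every dependency disjunction d of every installed p must meet the installation.
    deps_ok = all(set(d) & inst for p in inst for d in deps.get(p, []))
    # No conflicting pair may be installed together (pairs are read symmetrically).
    conflicts_ok = not any((a, b) in conflicts or (b, a) in conflicts
                           for a in inst for b in inst)
    return deps_ok and conflicts_ok

def is_installable(p, repo, deps, conflicts):
    """Brute force: try every subset of the repository that contains p."""
    others = sorted(set(repo) - {p})
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            if is_healthy({p, *extra}, deps, conflicts):
                return True
    return False

def is_trimmed(repo, deps, conflicts):
    """A repository is trimmed if each of its packages is installable in it."""
    return all(is_installable(p, repo, deps, conflicts) for p in repo)
```

An empty disjunction in `deps[p]` makes `p` uninstallable, matching the remark about $\emptyset \in D(p)$.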

2.2 The testing migration problem 

For the formalization of the testing migration problem, we consider two repositories
$T \subseteq \mathcal{B}$ and $U \subseteq \mathcal{B}$, dubbed "testing" and "unstable". We assume that these are all
packages that we need to worry about, i.e. $\mathcal{B} = T \cup U$. A migration is then a modified
testing repository $T'$ such that various requirements are fulfilled. These are in practice
currently implementation-defined; the closest to a specification is given informally
in the comments of the current implementation, britney2.py^1. Here, we treat these
validness requirements abstractly:

1. (Uniqueness) A package name occurs at most once: $C_u \cap (T' \times T') = \emptyset$.

2. (Trimmedness) The repository T' is trimmed. 

3. (Validness) Further requirements, such as that all binaries from a certain source
package migrate from $T$ to $T'$ together or not at all, or that binaries for which newer
versions exist can only remain in $T'$ if they are from a certain section. For the
purposes of this paper we just assume that $T' = T$ is always valid and that the rules
can be straightforwardly formulated as a SAT instance as described in section

A choice of $T'$ is now called admissible if all three requirements are fulfilled. We
assume $T$ to be trimmed and to contain every binary at most once. Therefore, $T' = T$ is
always admissible; this is the trivial migration. A migration is measured by the size of
the symmetric difference of $T$ and $T'$, and generally we are interested in a largest
migration; but it is also of interest to find the smallest migration containing a fixed
$p \in U \setminus T$.

2.3 SAT and PMAX-SAT 

Our approach to this problem is to formulate the problem as a boolean satisfiability 
problem (SAT), such that a solution to that problem is guaranteed to represent an ad- 
missible migration. Furthermore, if the solver allows us to mark some clauses as desired, 
then we can find a largest migration; the problem then is an instance of PMAX-SAT. 



^1 http://anonscm.debian.org/gitweb/?p=mirror/britney2.git;a=blob;f=britney.py;hb=HEAD
Formally, given a set $V$ of atoms, a SAT instance $P \in \mathcal{P}(\mathcal{P}(V^\pm))$ is a set of
subsets (called clauses) of the set $V^\pm := V \cup V^-$ of literals, where a literal is
either an atom $v$ or its formal negation $v^-$. A solution of an instance $P$ is a subset of
atoms $S \subseteq V$, called true, such that each clause is fulfilled, i.e. has at least one
true atom as a literal or one false atom as a negated literal:

$$\forall c \in P\colon c \cap S \neq \emptyset \;\vee\; c \cap (V \setminus S)^- \neq \emptyset.$$

We use $\{A \to B\}$ as an abbreviation of the clause $A^- \cup B$, where $A$ and $B$ are either
sets of atoms or lists of atoms understood as sets, and $\{v \uparrow w\}$ as an abbreviation of
$\{v^-, w^-\}$.
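
To make the clause notation concrete, here is a minimal sketch of one possible representation, assuming a literal is a pair of an atom and a polarity flag and a clause is a frozenset of such pairs. The helper names (`implies`, `nand`, `fulfilled`, `is_solution`) are hypothetical and only mirror the abbreviations introduced above.

```python
def pos(v):
    """The literal v."""
    return (v, True)

def neg(v):
    """The formal negation of v."""
    return (v, False)

def implies(A, B):
    """The clause {A -> B}: all atoms of A negated, all atoms of B positive."""
    return frozenset([neg(a) for a in A] + [pos(b) for b in B])

def nand(v, w):
    """The clause {v ^ w} with both atoms negated: v and w are not both true."""
    return frozenset([neg(v), neg(w)])

def fulfilled(clause, S):
    """A clause is fulfilled if it has a true atom as a positive literal
    or a false atom as a negated literal."""
    return any((atom in S) == polarity for atom, polarity in clause)

def is_solution(instance, S):
    """S is a solution if every clause of the instance is fulfilled."""
    return all(fulfilled(c, S) for c in instance)
```

For example, `is_solution([implies(["p"], ["q"]), nand("q", "r")], {"p", "q"})` holds, while the set `{"p"}` alone violates the first clause.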

A PMAX-SAT instance $P \in \mathcal{P}(\mathcal{P}(V^\pm)) \times \mathcal{P}(\mathcal{P}(V^\pm))$ consists of two sets of
clauses, the first set being the hard clauses and the second set being the soft clauses.
Its solutions are those of the hard clauses, understood as a regular SAT problem; the
quality of a solution $S$ is measured by the number of fulfilled soft clauses:

$$\#\{c \in \pi_2(P) \mid c \cap S \neq \emptyset \vee c \cap (V \setminus S)^- \neq \emptyset\}.$$

For both SAT and PMAX-SAT, a variety of good general purpose solvers are avail- 
able. 
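
For very small instances, the PMAX-SAT definition can be evaluated by plain enumeration, which is useful for checking an encoding before handing it to a real solver. The sketch below is an illustration only, assuming clauses are frozensets of (atom, polarity) pairs; `brute_force_pmaxsat` is a hypothetical name and its running time is exponential in the number of atoms.

```python
from itertools import combinations

def fulfilled(clause, S):
    """True if some literal of the clause agrees with the assignment S."""
    return any((atom in S) == polarity for atom, polarity in clause)

def quality(soft, S):
    """Number of fulfilled soft clauses."""
    return sum(1 for c in soft if fulfilled(c, S))

def brute_force_pmaxsat(atoms, hard, soft):
    """Return a set of true atoms that fulfills all hard clauses and as many
    soft clauses as possible, or None if the hard clauses are unsatisfiable."""
    atoms = sorted(atoms)
    best = None
    for k in range(len(atoms) + 1):
        for chosen in combinations(atoms, k):
            S = set(chosen)
            if not all(fulfilled(c, S) for c in hard):
                continue
            if best is None or quality(soft, S) > quality(soft, best):
                best = S
    return best
```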



3 Encoding the testing migration problem as a SAT problem 

The main ideas of this paper can be found in this section, in which we will describe an 
encoding of the testing migration problem as a SAT instance. We give a series of differ- 
ent encodings, starting with an obvious one that is incapable of handling conflicts, then 
a naive, but prohibitively large encoding that handles conflicts, followed by further 
improvements to reduce the size of the instance. 

3.1 Encoding in absence of conflicts 

Assume first that there are no conflicts involved ($C = \emptyset$). Then the testing migration
problem can be straightforwardly cast into a SAT instance. We take the set of packages
as the set of atoms ($V = \mathcal{B}$) and define clauses that enforce the three conditions for an
admissible migration:

$$\begin{aligned}
P_u &:= \{\{p_1 \uparrow p_2\} \mid (p_1, p_2) \in C_u\} \\
P^1_d &:= \{\{p \to d\} \mid p \in \mathcal{B},\ d \in D(p)\} \\
P_v &:= \{\ldots\} \\
P^1 &:= P_u \cup P^1_d \cup P_v
\end{aligned}$$

A migration $T' \subseteq \mathcal{B}$ is now admissible if and only if $T'$ is a solution of $P^1$.
Proof A solution $T'$ of $P^1$ is an admissible migration: This is obvious for the
Uniqueness requirement, as it is directly expressed in the clauses in $P_u$. The (here
unspecified) validness is also enforced by a straightforward representation in $P_v$.
Furthermore, $T'$ is trimmed: Every package $p \in T'$ is installable, because $I = T'$ is
already a healthy installation; the dependencies are fulfilled by $P^1_d$, and there are no
conflicts.

Conversely, an admissible migration $T'$ fulfills $P^1_d$: Consider a clause
$c = \{p^-\} \cup d \in P^1_d$, arising from a package $p \in \mathcal{B}$ and a dependency disjunction
$d \in D(p)$. If $p \notin T'$, then $c \cap (\mathcal{B} \setminus T')^- = \{p^-\} \neq \emptyset$. On the other hand, if
$p \in T'$, then there exists an installation $I \subseteq T'$ with $p \in I$ and $d \cap I \neq \emptyset$, hence
$d \cap T' \neq \emptyset$. So every clause in $P^1_d$ is fulfilled. ■
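
The conflict-free encoding is small enough to be generated in a few lines. The following sketch assumes packages are (name, version) tuples and clauses are frozensets of (atom, polarity) pairs; the function name is hypothetical, and the validness clauses are left out because the text keeps them abstract.

```python
def encode_no_conflicts(packages, deps):
    """Build the uniqueness and dependency clauses of the conflict-free encoding.

    packages: set of (name, version) tuples
    deps:     dict mapping a package to a list of disjunctions (lists of packages)
    """
    packages = sorted(packages)
    # Uniqueness: two packages with the same name but different versions
    # must not both be selected.
    uniqueness = {frozenset([(p1, False), (p2, False)])
                  for p1 in packages for p2 in packages
                  if p1[0] == p2[0] and p1[1] != p2[1]}
    # Dependencies: if p is selected, one member of each disjunction d is selected.
    dependencies = {frozenset([(p, False)] + [(q, True) for q in d])
                    for p in packages for d in deps.get(p, [])}
    return uniqueness | dependencies
```

Since clauses are sets, the two orderings of a same-name pair collapse into a single clause.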



This problem encoding is sufficiently small and fast, having one atom per package under
consideration. Unfortunately, it cannot be directly extended to cater for conflicts:
If we take $(p_1, p_2) \in C$ to imply that $p_1 \notin T' \vee p_2 \notin T'$, then we will disallow
valid migrations, as conflicts affect just installations, not repositories. So do package
dependencies at first glance, but they are positive in the sense that adding more packages
to an installation does not affect the installability of existing packages negatively.

When applied to the real dataset that occurs in the migration of unstable to testing in
Debian^2, this generates 263765 atoms and 1938652 clauses in $P^1_d$.



3.2 Encoding with conflicts 

Now allow $C \neq \emptyset$. To represent the trimmedness of a repository directly as a SAT
problem, we have to encode the search for an installation for each package. To that end,
we take as atoms the packages, as before, and additional atoms for each pair of packages,
where we write such a pair as $p@p_i$, with the intended meaning of "$p$ is in the
installation for $p_i$":

$$V = \mathcal{B} \cup \{p@p_i \mid p, p_i \in \mathcal{B}\}.$$

We leave $P_u$ and $P_v$ as before and define clauses that cater for trimmedness.

$$\begin{aligned}
P^2_e &:= \{\{p@p_i \to p\} \mid p_i, p \in \mathcal{B}\} \\
P^2_t &:= \{\{p \to p@p\} \mid p \in \mathcal{B}\} \\
P^2_d &:= \{\{p@p_i \to \{p'@p_i \mid p' \in d\}\} \mid p_i \in \mathcal{B},\ p \in \mathcal{B},\ d \in D(p)\} \\
P^2_c &:= \{\{p_1@p_i \uparrow p_2@p_i\} \mid (p_1, p_2) \in C,\ p_i \in \mathcal{B}\} \\
P^2 &:= P_u \cup P^2_e \cup P^2_t \cup P^2_d \cup P^2_c \cup P_v
\end{aligned}$$

A solution $S$ of this SAT instance defines an admissible repository $T' := S \cap \mathcal{B}$.
^2 data from 2012-03-30, 11 architectures
Proof For each package $p \in T'$, define an installation $I_p := \{p' \in \mathcal{B} \mid p'@p \in S\}$.
This installation contains $p$ (by the clause $\{p \to p@p\} \in P^2_t$); it is a subset of $T'$
by the clauses in $P^2_e$; and it is healthy, as all dependencies of $p' \in I_p$ are fulfilled
in $I_p$ by the clauses in $P^2_d$ and no conflicting package can exist in $I_p$ by the clauses
$P^2_c$. So $T'$ is trimmed and, due to $P_u$ and $P_v$ as before, admissible. ■

Conversely, for every admissible migration $T'$ there is a solution $S$ of $P^2$ such that
$T' = S \cap \mathcal{B}$.

Proof Since $T'$ is trimmed, we have for each package $p \in T'$ a healthy installation $I_p$;
let $S := T' \cup \{p'@p \mid p \in T',\ p' \in I_p\}$. This solution fulfills all new clauses above,
as can be seen directly. The clauses in $P_u$ and $P_v$ are fulfilled as before. ■
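
The four clause families that express trimmedness in the presence of conflicts can be generated mechanically. This is a sketch under assumptions: pair atoms are modelled as tuples built by a hypothetical helper `at(p, pi)`, clauses are frozensets of (atom, polarity) pairs, and the uniqueness and validness clauses are omitted.

```python
def at(p, pi):
    """Atom meaning 'p is in the installation for pi'."""
    return ("@", p, pi)

def encode_with_conflicts(packages, deps, conflicts):
    """Generate the trimmedness clauses of the naive encoding with conflicts."""
    B = sorted(packages)
    # A package used in some installation must be in the repository.
    p_e = {frozenset([(at(p, pi), False), (p, True)]) for pi in B for p in B}
    # A package in the repository must be part of its own installation.
    p_t = {frozenset([(p, False), (at(p, p), True)]) for p in B}
    # Dependencies must be fulfilled inside each installation.
    p_d = {frozenset([(at(p, pi), False)] + [(at(q, pi), True) for q in d])
           for pi in B for p in B for d in deps.get(p, [])}
    # Conflicting packages must not share an installation.
    p_c = {frozenset([(at(p1, pi), False), (at(p2, pi), False)])
           for (p1, p2) in conflicts for pi in B}
    return p_e | p_t | p_d | p_c
```

The nested loops over all pairs of packages are exactly what makes this encoding quadratic in the size of the repository.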

So we found a faithful encoding of the problem as a SAT problem instance. But it
is prohibitively large if we are indeed generating variables and clauses for each pair
of packages; we would be requiring 69572238990 atoms and generating 511348544780
clauses in $P^2_d$ alone!
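
The two figures can be reproduced from the numbers reported for the conflict-free encoding, assuming one atom per package plus one atom per ordered pair of packages, and one copy of every dependency clause per package. The variable names below are illustrative only.

```python
n_packages = 263765        # atoms of the conflict-free encoding, one per package
n_dep_clauses = 1938652    # dependency clauses reported for that encoding

# One atom per package plus one atom p@p_i per ordered pair of packages.
atoms_naive = n_packages + n_packages ** 2

# Every dependency clause is repeated once for each package p_i.
dep_clauses_naive = n_packages * n_dep_clauses

print(atoms_naive)         # 69572238990
print(dep_clauses_naive)   # 511348544780
```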

3.3 Trimming the problem 

We will reduce the size of the instance by using the first approach (encoding the
dependencies directly between the variables representing the packages) when possible and
fall back to the previous expensive but complete approach when required.

For that we need to introduce the "may depend" relation $\bar{D}$, which is an
approximation of $D$:

$$\bar{D}(p) = \bigcup D(p).$$

We will most often work with its reflexive, transitive closure, written $D^*$, and say that
$D^*(p)$ is the dependency closure of $p$.
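
The dependency closure is an ordinary graph reachability computation. A minimal sketch, assuming the dependency function is a dict from packages to lists of disjunctions; the function names are hypothetical.

```python
from collections import deque

def may_depend(p, deps):
    """All packages mentioned in any dependency disjunction of p."""
    return set().union(*deps.get(p, []))

def dependency_closure(p, deps):
    """Reflexive, transitive closure of the 'may depend' relation, starting at p."""
    seen = {p}
    queue = deque([p])
    while queue:
        q = queue.popleft()
        for r in may_depend(q, deps):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen
```

The `seen` set makes the traversal terminate on cyclic dependencies, which do occur in real package repositories.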



3.3.1 Only consider possible dependencies 

It is obvious that we created way too many variables and clauses in the second attempt,
as the installability of $p'$ is irrelevant when trying to find an installation for $p$ if
$p' \notin D^*(p)$. We phrase this as a lemma, which follows from Proposition 1 in [MBC+06]:

Lemma 1 If $p$ is installable in a repository $R$, then there is a healthy installation $I$
containing $p$ such that $I \subseteq D^*(p)$.
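Using this, we adjust the previous setup of the instance. The set of variables is now
$\mathcal{B} \cup \{p'@p \mid p \in \mathcal{B},\ p' \in D^*(p)\}$ and the clauses are as follows, where
$C|_M := C \cap (M \times M)$ denotes the restriction of $C$ to a set $M$:

$$\begin{aligned}
P^3_e &:= \{\{p@p_i \to p\} \mid p_i \in \mathcal{B},\ p \in D^*(p_i)\} \\
P^3_t &:= \{\{p \to p@p\} \mid p \in \mathcal{B}\} \\
P^3_d &:= \{\{p@p_i \to \{p'@p_i \mid p' \in d\}\} \mid p_i \in \mathcal{B},\ p \in D^*(p_i),\ d \in D(p)\} \\
P^3_c &:= \{\{p_1@p_i \uparrow p_2@p_i\} \mid p_i \in \mathcal{B},\ (p_1, p_2) \in C|_{D^*(p_i)}\} \\
P^3 &:= P_u \cup P^3_e \cup P^3_t \cup P^3_d \cup P^3_c \cup P_v
\end{aligned}$$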

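A solution $S$ of $P^3$ is also a solution of $P^2$ and hence defines a trimmed migration.
Proof This follows from $P^3 \subseteq P^2$: every clause of $P^2$ that is not in $P^3$ contains
the negation of an atom $p@p_i$ with $p \notin D^*(p_i)$, which is not in $S$. ■

Conversely, a trimmed migration defines a solution $S$ of $P^3$ as it did for $P^2$.

Proof By Lemma 1 we can choose the installation $I_p$ of a package $p \in S \cap \mathcal{B}$ as a
subset of $D^*(p)$. ■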

This is already a considerable improvement over the naive approach in the last section, 
having only 36708835 atoms and 121591516 clauses to consider. 

3.3.2 Ignore always-installable packages 

The next step is to realize that some packages $p$ have the nice property that if they
are present in the repository, and installable on their own, then they can always be
used to fulfill another package's dependency without worrying about $p$'s dependencies.
This is trivially the case if the package has no dependencies or conflicts, but also -
less trivially - if the dependency closure of $p$ does not take part in any conflicts. Let
$\mathcal{E} := \{p \in \mathcal{B} \mid D^*(p) \cap \pi_1(C) = \emptyset\}$ be the set of these easy packages and $D_h$
the (range and domain) restriction of the "may depend" relation to the hard packages
$\mathcal{B} \setminus \mathcal{E}$.
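
Computing the set of easy packages only needs the dependency closure and the set of packages that occur in some conflict. The sketch below assumes the same dict-based dependency representation as elsewhere; both function names are hypothetical. Because the conflicts relation is symmetric, the code collects both ends of every pair, so it also works when each conflict is listed in one direction only.

```python
def dependency_closure(p, deps):
    """Reflexive, transitive closure of the 'may depend' relation, starting at p."""
    seen, stack = {p}, [p]
    while stack:
        q = stack.pop()
        for d in deps.get(q, []):
            for r in d:
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
    return seen

def easy_packages(packages, deps, conflicts):
    """Packages whose dependency closure takes part in no conflict."""
    in_conflict = {p for pair in conflicts for p in pair}
    return {p for p in packages
            if not (dependency_closure(p, deps) & in_conflict)}
```

The hard packages are then simply `set(packages) - easy_packages(packages, deps, conflicts)`.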

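This allows us to further reduce the number of atoms and clauses, by adjusting the
previous setup of the instance. The set of variables is now
$\mathcal{B} \cup \{p'@p \mid p \in \mathcal{B},\ p' \in D_h^*(p)\}$ and the clauses are:

$$\begin{aligned}
P^4_e &:= \{\{p@p_i \to p\} \mid p_i \in \mathcal{B},\ p \in D_h^*(p_i)\} \\
P^4_t &:= \{\{p \to p@p\} \mid p \in \mathcal{B}\} \\
P^4_d &:= \{\{p@p_i \to \{p'@p_i \mid p' \in d \setminus \mathcal{E}\} \cup \{p' \mid p' \in d \cap \mathcal{E}\}\} \\
      &\qquad \mid p_i \in \mathcal{B},\ p \in D_h^*(p_i),\ d \in D(p)\} \\
P^4_c &:= \{\{p_1@p_i \uparrow p_2@p_i\} \mid p_i \in \mathcal{B},\ (p_1, p_2) \in C|_{D_h^*(p_i)}\} \\
P^4 &:= P_u \cup P^4_e \cup P^4_t \cup P^4_d \cup P^4_c \cup P_v
\end{aligned}$$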

Again, given a solution $S$ of $P^4$, we can construct a solution $S'$ of $P^3$.

Proof We extend the installation of a package by all easy packages in its dependency
closure:

$$S' = S \cup \{p'@p \mid p \in S,\ p' \in \mathcal{E} \cap D^*(p) \cap S\}.$$

First note that an easy package $p$ with an unfulfillable dependency $\emptyset \in D(p)$ cannot
be in $S$, due to $P^4_d$. Now let $\{p@p_i \to \{p'@p_i \mid p' \in d\}\}$ be a clause in $P^3_d$ and
$p@p_i \in S'$ (otherwise the clause is trivially fulfilled). If $p@p_i \notin S$, then
$p \in \mathcal{E} \cap S$, so $d \neq \emptyset$ and there is a $p' \in d \subseteq D^*(p_i)$, and $p'@p_i \in S'$ fulfills
the clause. Now assume $p@p_i \in S$. If the corresponding clause in $P^4_d$ is fulfilled by
$p'@p_i \in S$ for $p' \in d \setminus \mathcal{E}$, this also fulfills the clause in $P^3_d$. If the clause in
$P^4_d$ is fulfilled by a $p' \in d \cap \mathcal{E}$ with $p' \in S$, then $p'@p_i$ is in $S'$ by definition, so
the clause in $P^3_d$ is fulfilled.

The clauses in $P^3_e$ resp. $P^3_c$ are either in $P^4_e$ resp. $P^4_c$ or have the negation of a
$p@p_i$ with $p \notin D_h^*(p_i)$ as a literal. As $p@p_i$ is neither in $S$ nor one of the variables
added to obtain $S'$, the clauses are fulfilled. ■

Conversely, a solution $S'$ of $P^3$ gives rise to a solution $S$ of $P^4$ by intersecting it
with the variables used in $P^4$.

Proof This holds because $P^4_e \subseteq P^3_e$, $P^4_c \subseteq P^3_c$ and clauses in $P^4_d$ correspond to
clauses in $P^3_d$ with occurrences of $p'@p$ replaced by $p'$, which cannot turn from
fulfilled to unfulfilled because of the clause $\{p'@p \to p'\} \in P^3_e$. ■

This optimization reduces the size of the instance to 5235551 atoms and 31793397 clauses.

3.3.3 Considering relevant conflicts

The previous two refinements are subsumed by this refinement, where we identify for 
a package p which conflicts and dependencies are actually relevant for its installabil- 
ity, and test the installability the verbose way only for those dependencies linking a 
package with its relevant conflicts. 
A conflict is relevant for $p$ if both its ends are in the dependency closure of $p$. More
formally we define $C_r(p) := C|_{D^*(p)} = C \cap (D^*(p) \times D^*(p))$.

Now we find the connecting dependencies of a package $p$. These are those packages
that are in the dependency closure of $p$ and have one end of a relevant conflict in their
own dependency closure:

$$D_r(p) := \{p' \in D^*(p) \mid \pi_1(C_r(p)) \cap D^*(p') \neq \emptyset\} \cup \{p\}.$$

We artificially add $p$ to the set to avoid special-casing this atom in $P^5_t$; in practice,
one would omit creating an atom $p@p$ if $C_r(p) = \emptyset$.
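
Relevant conflicts and connecting dependencies follow directly from the dependency closure. A minimal sketch with hypothetical function names, assuming the dict-based dependency representation and conflicts given as a set of pairs; both ends of each relevant conflict are collected, which equals the first projection when the relation is stored symmetrically.

```python
def dependency_closure(p, deps):
    """Reflexive, transitive closure of the 'may depend' relation, starting at p."""
    seen, stack = {p}, [p]
    while stack:
        q = stack.pop()
        for d in deps.get(q, []):
            for r in d:
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
    return seen

def relevant_conflicts(p, deps, conflicts):
    """Conflicts with both ends inside the dependency closure of p."""
    closure = dependency_closure(p, deps)
    return {(a, b) for (a, b) in conflicts if a in closure and b in closure}

def connecting_dependencies(p, deps, conflicts):
    """Packages in the closure of p that can reach an end of a relevant conflict,
    plus p itself."""
    ends = {a for pair in relevant_conflicts(p, deps, conflicts) for a in pair}
    closure = dependency_closure(p, deps)
    return {q for q in closure if dependency_closure(q, deps) & ends} | {p}
```

Recomputing the closure for every candidate is wasteful; a real implementation would presumably cache closures, which this sketch leaves out for brevity.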

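Now we can construct the following SAT instance: The set of variables is
$\mathcal{B} \cup \{p'@p \mid p \in \mathcal{B},\ p' \in D_r(p)\}$ and the clauses are:

$$\begin{aligned}
P^5_e &:= \{\{p@p_i \to p\} \mid p_i \in \mathcal{B},\ p \in D_r(p_i)\} \\
P^5_t &:= \{\{p \to p@p\} \mid p \in \mathcal{B}\} \\
P^5_d &:= \{\{p@p_i \to \{p'@p_i \mid p' \in d \cap D_r(p_i)\} \cup \{p' \mid p' \in d \setminus D_r(p_i)\}\} \\
      &\qquad \mid p_i \in \mathcal{B},\ p \in D_r(p_i),\ d \in D(p)\} \\
P^5_c &:= \{\{p_1@p_i \uparrow p_2@p_i\} \mid p_i \in \mathcal{B},\ (p_1, p_2) \in C|_{D_r(p_i)}\} \\
P^5 &:= P_u \cup P^5_e \cup P^5_t \cup P^5_d \cup P^5_c \cup P_v
\end{aligned}$$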

Again we transform a solution of $P^4$ into one of $P^5$ and vice versa.

Proof Note that easy packages are never relevant dependencies, so $D_r(p) \subseteq D_h^*(p)$.
By a similar argument as before, intersecting a solution $S$ of $P^4$ with the variables used
in $P^5$ turns it into a solution of $P^5$.

For the other direction, let $S$ be a solution of $P^5$. Assume for this proof that the
relation $D$ is acyclic (otherwise, the proof is possible using fixed-point induction). For
each package $p_i \in S$, define an installation recursively as

$$I_{p_i} := \{p \in \mathcal{B} \mid p@p_i \in S\} \cup \bigcup \{I_{p'} \mid p \in D_r(p_i),\ p' \in \textstyle\bigcup D(p),\ p' \in S,\ p' \notin D_r(p_i)\}.$$

This installation is in the repository $S \cap \mathcal{B}$ prescribed by the solution: Packages from
the first set are in $S$ by the corresponding clause in $P^5_e$, those from the big union
because $p' \in S$ and $I_{p'} \subseteq S$ by induction. Furthermore, it contains $p_i$, because
$p_i \in S$ and the corresponding clause in $P^5_t$. All packages $p \in I_{p_i}$ have their
dependencies fulfilled; either because they come from an $I_{p'}$ and hence by induction, or
because they come from a $p@p_i \in S$. Then $P^5_d$ ensures that each disjunction of
dependencies $d \in D(p)$ contains either a $p' \in d \cap D_r(p_i)$ with $p'@p_i \in S$, hence
$p' \in I_{p_i}$, or a $p' \in d \setminus D_r(p_i)$ with $p' \in S$, which would effect
$I_{p'} \subseteq I_{p_i}$ and hence $p' \in I_{p_i}$.
To show that $I_{p_i}$ is healthy, it remains to show that no conflict occurs in $I_{p_i}$. By
the definition of $I_{p_i}$ and $D_r(p_i)$, we can see that $I_{p_i} \subseteq D^*(p_i)$. Assume now that
$c = (p_1, p_2) \in C \cap (I_{p_i} \times I_{p_i})$. Then $c \in C_r(p_i)$ and hence $p_1, p_2 \in D_r(p_i)$
and $p_1@p_i, p_2@p_i \in S$, which is a contradiction to the corresponding clause in $P^5_c$. ■

The numbers of this approach are 3276791 atoms and 21128454 clauses. 



4 Finding optimal solutions with PMAX-SAT 

Solving the SAT encoding described in the previous section will result in any of many
possible solutions, but not necessarily the best one. Recall that a migration from $T$
to $T'$ is measured by the size of the symmetric difference between $T$ and $T'$. Usually,
one is interested in the largest migration. To achieve that, soft clauses are added to the
problem:

$$P^{max} := \{\{v\} \mid v \in U \setminus T\} \cup \{\{v^-\} \mid v \in T \setminus U\}$$

Feeding these together with the hard clauses $P$ from the previous section to a PMAX-SAT
solver will find a solution that fulfills as many clauses from $P^{max}$ as possible; this
number is exactly the size of the symmetric difference between $T$ and $T'$.
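
The soft clauses for a largest migration are unit clauses and can be built directly from the two repositories. A minimal sketch, assuming repositories are Python sets of packages and clauses are frozensets of (atom, polarity) pairs; the function names are hypothetical.

```python
def soft_clauses_max(T, U):
    """Prefer adding every package that is only in U and removing every
    package that is only in T."""
    return ([frozenset([(v, True)]) for v in sorted(U - T)] +
            [frozenset([(v, False)]) for v in sorted(T - U)])

def fulfilled(clause, S):
    """True if some literal of the clause agrees with the assignment S."""
    return any((atom in S) == polarity for atom, polarity in clause)

def fulfilled_soft(soft, T_new):
    """Number of soft clauses fulfilled by a candidate migration."""
    return sum(1 for c in soft if fulfilled(c, T_new))

def migration_size(T, T_new):
    """Size of the symmetric difference between old and new testing."""
    return len(T ^ T_new)
```

For `T = {"a1", "b1"}`, `U = {"a2", "b1"}` and the candidate `{"a2", "b1"}`, both soft clauses are fulfilled and the migration size is 2.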

Alternatively, one may be interested in a smallest non-trivial solution. In this case, a
non-triviality clause $P_{nt}$ is added to the hard clauses, and the soft clauses are inverted:

$$\begin{aligned}
P_{nt} &:= \{\{v \mid v \in U \setminus T\} \cup \{v^- \mid v \in T \setminus U\}\} \\
P^{min} &:= \{\{v^-\} \mid v \in U \setminus T\} \cup \{\{v\} \mid v \in T \setminus U\}
\end{aligned}$$

If one is interested in a smallest migration of one particular package $p \in U \setminus T$,
adding the unit clause $\{p\}$ to the hard clauses and taking $P^{min}$ as the soft clauses
will find such a migration, if it exists. If not, then extracting a minimal unsatisfiable
core from the SAT solver provides an explanation as to why the package does not migrate, a
very helpful feature.
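
The clauses for the smallest-migration variants can be sketched in the same style, again assuming sets of packages and clauses as frozensets of (atom, polarity) pairs. All function names are hypothetical.

```python
def non_triviality_clause(T, U):
    """One hard clause: at least one package is added or removed."""
    return frozenset([(v, True) for v in U - T] +
                     [(v, False) for v in T - U])

def soft_clauses_min(T, U):
    """Inverted soft clauses: prefer to leave every package where it is."""
    return ([frozenset([(v, False)]) for v in sorted(U - T)] +
            [frozenset([(v, True)]) for v in sorted(T - U)])

def force_package(p):
    """Hard unit clause requiring the package p to migrate."""
    return frozenset([(p, True)])

def fulfilled(clause, S):
    """True if some literal of the clause agrees with the assignment S."""
    return any((atom in S) == polarity for atom, polarity in clause)
```

The trivial migration leaves every package of `U - T` out and keeps every package of `T - U`, so it violates the non-triviality clause while fulfilling all inverted soft clauses.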



5 Implementation 

Our implementation is written in Haskell and has 2300 lines of code. It can read Packages
files as used by dpkg-based distributions (Debian, Ubuntu) and can generate, besides
a description of the final repository state, "hints" that can be fed to the currently used
testing migration implementation. Therefore, it can improve the current setup without
having to replace it. To detect packages that are not installable in the first place,
edos-debcheck from [MBC+06] is used.
To solve SAT instances and to generate minimal unsatisfiable cores, the free SAT solver
picosat [Bie08] is used. To solve the PMAX-SAT instances, it supports clasp [GKNS07]
and Sat4j [BP10]; the latter is used by default. We experimented with other solvers
such as MiniMaxSat and MSUnCore as well, but these eliminated themselves by not
being licensed under a Free Software license, a natural requirement for a project like
Debian.
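
Partial MaxSAT solvers commonly read a DIMACS-style "wcnf" text format in which hard clauses carry a designated top weight. The text does not say how the tool talks to its solvers, so the following is only a sketch of how hard and soft clauses could be serialized, assuming the classic header `p wcnf <variables> <clauses> <top>`, clauses as frozensets of (atom, polarity) pairs, and a hypothetical function name `to_wcnf`.

```python
def to_wcnf(hard, soft):
    """Serialize hard and soft clauses in classic DIMACS WCNF syntax.

    Hard clauses get the weight `top`, soft clauses get weight 1; `top` is
    chosen larger than the sum of all soft weights.
    """
    hard, soft = list(hard), list(soft)
    atoms = sorted({a for c in hard + soft for a, _ in c}, key=repr)
    index = {a: i + 1 for i, a in enumerate(atoms)}  # DIMACS variables start at 1
    top = len(soft) + 1

    def line(weight, clause):
        lits = sorted(index[a] if pol else -index[a] for a, pol in clause)
        return " ".join(str(x) for x in [weight, *lits, 0])

    lines = [f"p wcnf {len(atoms)} {len(hard) + len(soft)} {top}"]
    lines += [line(top, c) for c in hard]
    lines += [line(1, c) for c in soft]
    return "\n".join(lines)
```

Atoms are numbered in the order of their `repr`, so the mapping is deterministic for a given instance and can be inverted to read a solver's model back as a set of packages.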

The code is Free Software, licensed under the GPL, and can be obtained from the code
repository at http://git.nomeata.de/?p=sat-britney.git.



6 Related work 



We build upon work done on the problem of testing the installability of packages,
especially [MBC+06]. Further work in that direction investigated not only the
installability of a single package, but also how to determine sets of co-installable
packages [CV11], and how to find good choices for upgrading an installation [CZT08],
[TLO10].

Recent unpublished work by Jerome Vouillon based on is also able to solve the
migration problem while enforcing the stronger requirement that packages that were
co-installable in testing before are still co-installable afterwards. Optionally, this
requirement can be relaxed, so the testing migration problem as described here can also
be solved. Their implementation performs better than ours. The main difference is that
our approach finds a tractable and easily understandable encoding in SAT and uses
off-the-shelf solvers, while their tool, written in OCaml, applies sophisticated
transformations of the package relations, identifying equivalent packages and solving the
resulting smaller problem without the help of external tools. From a user's point of
view, our tool provides nothing over their tool.
8 Conclusion and further work 



We have shown the feasibility of solving the testing migration problem using
off-the-shelf SAT solvers, and empirically verified the usefulness of the approach by
applying it to the large package repository created by the Debian project.
Although the current state of the program yields usable results, we expect that further
reductions in the SAT problem size are possible. A considerably faster tool would allow
interactive use, which can assist the distribution maintainers in finding out why a
certain package does not migrate.

Currently, conflicts are considered relevant for packages where one would not expect it.
For example, the package file-rc provides and conflicts with the common package
sysv-rc, which appears in the transitive dependency closure of many packages (more
than 8000). A sound criterion that would render such a conflict irrelevant for most
packages would considerably reduce the size of the SAT instance.

Similarly, if there is a package $p' \in D^*(p)$ that is independent from $p$ in the sense
that all edges leaving $D^*(p')$ in the graph of dependencies and conflicts on $D^*(p)$ are
incident to $p'$, then any conflicts in $D^*(p')$ can be removed from $C_r(p)$. It remains to
be investigated if deciding this condition takes less time than is saved afterwards.